This directory contains a codebook for all the variables in Wayne and Ying's "Gateways to White Nationalism: Leaders’ Online Rhetoric and Follower Engagement."


*** tweets.csv ***
created_at: Timestamp when tweet was posted
username: Randomly generated user IDs; equivalent to Twitter handle of the author
seedfollow: Number of leaders a follower account follows; "999" for leader accounts
follower_count: Number of followers the user has
rts: Number of retweets
replys: Number of replies
likes: Number of likes
quotes: Number of quote tweets
RelevantLogit: Model-predicted probability of tweet relevance
ads_dict: Count of advertisement-related keywords
ntoken: Total number of tokens (words) in tweet
gender_dict: Gender-related keyword count
nationalism_dict: Nationalist-related keyword count
partisan_dict: Partisan keyword count
race_dict: Race or ethnicity-related keyword count
religion_dict: Religion-related keyword count
Benevolent_dict: Benevolent sexism keyword count
Feminism_dict: Feminism-related keyword count
GenderIdentification_dict: Gender identity-related keyword count
General_dict: General gender-topic keyword count
Hostile_dict: Hostile sexism keyword count
ReproductiveRights_dict: Reproductive rights keyword count
SexualOrientation_dict: Sexual orientation-related keyword count
organization: Indicator for tweets from organizational accounts
individual: Indicator for tweets from individual users
overlap: Indicator for the (leader) user also being present on Gab
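
The seedfollow sentinel and the keyword counts above are easy to misread as ordinary numbers, so a minimal sketch of one way to handle them follows. It assumes seedfollow, ntoken, and the *_dict columns are stored as plain numbers in the CSV; the sample rows, the helper names split_leaders and keyword_rate, and the per-token rate itself are illustrative assumptions, not part of the replication code.

```python
import csv
import io

LEADER_CODE = 999  # per the codebook: seedfollow is "999" for leader accounts

# Hypothetical rows for illustration only; real rows come from tweets.csv.
sample = io.StringIO(
    "username,seedfollow,ntoken,race_dict\n"
    "u001,999,20,2\n"
    "u002,3,10,0\n"
    "u003,8,25,5\n"
)


def split_leaders(rows):
    """Separate leader rows from follower rows using the 999 sentinel."""
    leaders, followers = [], []
    for row in rows:
        # float() tolerates both "999" and "999.0" (assumed numeric storage).
        if float(row["seedfollow"]) == LEADER_CODE:
            leaders.append(row)
        else:
            followers.append(row)
    return leaders, followers


def keyword_rate(row, column):
    """Keyword count divided by token count; 0.0 when the post has no tokens."""
    ntoken = float(row["ntoken"])
    return float(row[column]) / ntoken if ntoken > 0 else 0.0


rows = list(csv.DictReader(sample))
leaders, followers = split_leaders(rows)
rates = {row["username"]: keyword_rate(row, "race_dict") for row in rows}
```

To run this against the real file, the sample object could be swapped for open("tweets.csv", newline="", encoding="utf-8"); the encoding is an assumption. Because 999 is a code rather than a count, leader rows should be excluded before averaging or plotting seedfollow.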


*** gabs.csv ***
created_at: Timestamp when post was created
username: Randomly generated user IDs; equivalent to Gab handle of the author
seedfollow: Number of leaders a follower account follows; “999” for leader accounts
followers_count: Number of followers the user has
statuses_count: Total number of posts made by the user
reblogs_count: Number of reposts (analogous to retweets)
favourites_count: Number of likes a post received
replies_count: Number of replies to the post
account_created_at: Timestamp when the user’s Gab account was created
RelevantLogit: Model-predicted probability of post relevance
ads_dict: Count of advertisement-related keywords
ntoken: Total number of tokens (words) in post
gender_dict: Gender-related keyword count
nationalism_dict: Nationalist-related keyword count
partisan_dict: Partisan keyword count
race_dict: Race or ethnicity-related keyword count
religion_dict: Religion-related keyword count
Benevolent_dict: Benevolent sexism keyword count
Feminism_dict: Feminism-related keyword count
GenderIdentification_dict: Gender identity-related keyword count
General_dict: General gender-topic keyword count
Hostile_dict: Hostile sexism keyword count
ReproductiveRights_dict: Reproductive rights keyword count
SexualOrientation_dict: Sexual orientation-related keyword count
organization: Indicator for posts from organizational accounts
individual: Indicator for posts from individual users
overlap: Indicator for the (leader) user also being present on Twitter


*** reposttweets.csv/reposttweets7plus.csv ***
created_at, username, RelevantLogit, ads_dict, ntoken, gender_dict, nationalism_dict, partisan_dict, race_dict, religion_dict: Same as in dataset tweets.csv
public_metrics.retweet_count: Number of times the tweet was retweeted
public_metrics.reply_count: Number of replies to the tweet
public_metrics.like_count: Number of likes the tweet received
public_metrics.quote_count: Number of quote tweets
seed: Indicator for whether the post originated from a seed (leader) account
folleader_repost: Number of reposts by core followers
out_repost: Number of reposts by peripheral followers
media: Indicator for whether the tweet includes media (image, video, or link)
public_metrics.followers_count: Number of followers the user has
public_metrics.following_count: Number of accounts the user follows
public_metrics.tweet_count: Total number of tweets the user has posted
public_metrics.listed_count: Number of public lists the user appears on


*** repostgabs.csv/repostgabs7plus.csv ***
username, created_at, reblogs_count, favourites_count, replies_count, account_created_at, followers_count, statuses_count, RelevantLogit, ads_dict, ntoken, gender_dict, nationalism_dict, partisan_dict, race_dict, religion_dict: Same as in dataset gabs.csv
folleader_repost: Number of reposts by core followers
out_repost: Number of reposts by peripheral followers
media: Indicator for whether the Gab post includes media (image, video, or link)
seed: Indicator for whether the post originated from a seed (leader) account


*** followerdistribution.csv ***
leaders_followed: Number of leaders’ accounts a follower user follows
accounts_to_be_scraped: Number of follower accounts meeting the criteria and selected for scraping


*** intercodergab.csv *** 
id: Randomly generated post IDs; equivalent to unique identifier for each Gab post manually coded
coder: Randomly generated user IDs; equivalent to identifier for the human coder who annotated the post
Relevant: Whether the post is relevant to the study topic
Ethnicity.Race: Whether the post contains race- or ethnicity-related content
Ethnicity.Race..key.words.or.phrases.: Key words or phrases indicating race/ethnicity themes
Religion: Whether the post contains religion-related content
Religion..key.words.or.phrases.: Key words or phrases indicating religion themes
Nationalism: Whether the post expresses nationalist content
Nationalism..key.words.or.phrases.: Key words or phrases indicating nationalism themes
Partisan: Whether the post expresses partisan or political alignment content
Partisan..key.words.or.phrases.: Key words or phrases indicating partisan themes
Gender: Whether the post contains gender-related content
Gender..key.words.or.phrases.: Key words or phrases indicating gender themes
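
Since this file holds one row per post and coder, agreement between two coders can be computed by pairing rows on id. The sketch below computes Cohen's kappa for the Relevant column. It assumes labels are stored as 0/1 and that each post id appears at most once per coder; the sample rows, the coder IDs c1 and c2, and the helper names are hypothetical, and kappa is used here only as one common agreement statistic, not necessarily the one reported in the paper.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for illustration only; real rows come from intercodergab.csv.
sample = io.StringIO(
    "id,coder,Relevant\n"
    "p1,c1,1\np1,c2,1\n"
    "p2,c1,0\np2,c2,0\n"
    "p3,c1,1\np3,c2,0\n"
    "p4,c1,0\np4,c2,0\n"
)


def paired_labels(rows, column, coder_a, coder_b):
    """Return (label_a, label_b) for every post that both coders annotated."""
    by_post = defaultdict(dict)
    for row in rows:
        by_post[row["id"]][row["coder"]] = row[column]
    return [
        (labels[coder_a], labels[coder_b])
        for labels in by_post.values()
        if coder_a in labels and coder_b in labels
    ]


def cohen_kappa(pairs):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no posts were coded by both coders")
    observed = sum(a == b for a, b in pairs) / n
    labels = {label for pair in pairs for label in pair}
    expected = sum(
        (sum(a == label for a, _ in pairs) / n)
        * (sum(b == label for _, b in pairs) / n)
        for label in labels
    )
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1 - expected)


rows = list(csv.DictReader(sample))
pairs = paired_labels(rows, "Relevant", "c1", "c2")
kappa = cohen_kappa(pairs)
```

The same two functions apply to the other label columns (for example Nationalism or Partisan) by changing the column argument.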


*** intercodertwitter.csv ***
id, coder, Relevant, Ethnicity.Race, Ethnicity.Race..key.words.or.phrases., Religion, Religion..key.words.or.phrases., Nationalism, Nationalism..key.words.or.phrases., Partisan, Partisan..key.words.or.phrases.: Same as in dataset intercodergab.csv


*** compareLLMs.csv ***
AD_Logit__Relevant: Prediction for relevance from the combination of the baseline logistic model (RelevantLogit) and the advertisement indicator (ads_dict)
Human__Relevant, Human__Racist, Human__Nationalist, Human__Religious, Human__Gender, Human__Partisan: Human-coded labels for each category
LlaMa__, Keyword__, Gpt 3.5 turbo-0125__, Gpt 4 turbo__, Gpt 4o__, Gemini 1.5-flash__, Mistral Small__, Mistral Large__ (for each paired category): Model-predicted binary outputs for relevance and the ideological categories (Racist, Nationalist, Religious, Gender, Partisan) from different automated coding methods (LLMs and baseline classifiers)
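
The column names in this file pair a method prefix with a category, which makes it straightforward to score every method against the human labels for one category. The sketch below is a minimal illustration: it assumes each full column name is the prefix followed by the category (for example "Gpt 4o__Relevant") and that labels are stored as 0/1. The sample rows and the function name agreement_with_human are hypothetical, and simple accuracy is used only as an example metric.

```python
import csv
import io

# Hypothetical rows for illustration only; real rows come from compareLLMs.csv.
sample = io.StringIO(
    "Human__Relevant,Keyword__Relevant,Gpt 4o__Relevant\n"
    "1,1,1\n"
    "0,1,0\n"
    "1,0,1\n"
    "0,0,0\n"
)


def agreement_with_human(rows, category, human_prefix="Human"):
    """Share of rows where each method's label equals the human label."""
    human_col = f"{human_prefix}__{category}"
    suffix = f"__{category}"
    totals, hits = {}, {}
    for row in rows:
        for col, value in row.items():
            # Only score non-human columns belonging to this category.
            if col == human_col or not col.endswith(suffix):
                continue
            # Skip rows where either label is missing.
            if not value or not row[human_col]:
                continue
            method = col[: -len(suffix)]
            totals[method] = totals.get(method, 0) + 1
            hits[method] = hits.get(method, 0) + (
                float(value) == float(row[human_col])
            )
    return {method: hits[method] / totals[method] for method in totals}


rows = list(csv.DictReader(sample))
accuracy = agreement_with_human(rows, "Relevant")
```

Note that AD_Logit__Relevant also ends in __Relevant, so under the naming assumption above it would be scored as a method named "AD_Logit" alongside the LLM columns.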

*** panelbottom25leader.csv ***date: Date of observation in the panel dataset
theme: Topical category of content (e.g., nationalism, gender, religion)
follower: Proportion of posts on the given theme made by follower accounts on that date
leader: Proportion of posts on the given theme made by leader accounts on that date


*** folder: dictionaries ***
Keywords used in the dictionary classification method